Smart Paper Notes

Artificial Text Detection with Multiple Training Strategies

Bin Li , Yixuan Weng , Qiya Song , Hanjun Deng

Categories: Natural Language Processing | Artificial Intelligence

2022-12-10

As deep learning advances rapidly, artificial texts created by generative models are commonly used in news and social media. However, such models can be abused to generate product reviews, fake news, and even fake political content. This paper proposes a solution for the Russian Artificial Text Detection in the Dialogue shared task 2022 (RuATD 2022), which asks participants to identify which model from a given list generated a given text. We introduce the DeBERTa pre-trained language model with multiple training strategies for this shared task. Extensive experiments on the RuATD dataset validate the effectiveness of the proposed method. Moreover, our submission ranked second in the evaluation phase of RuATD 2022 (Multi-Class).

GenLoco: Generalized Locomotion Controllers for Quadrupedal Robots

Gilbert Feng , Hongbo Zhang , Zhongyu Li , Xue Bin Peng , Bhuvan Basireddy , Linzhu Yue , Zhitao Song , Lizhi Yang , Yunhui Liu , Koushil Sreenath

Categories: Robotics | Machine Learning

2022-09-12

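Recent years have seen a surge in commercially available and affordable quadrupedal robots, with many of these platforms being actively used in research and industry. As the availability of legged robots grows, so does the need for controllers that enable these robots to perform useful skills. However, most learning-based frameworks for controller development focus on training robot-specific controllers, a process that must be repeated for every new robot. In this work, we introduce a framework for training generalized locomotion (GenLoco) controllers for quadrupedal robots. Our framework synthesizes general-purpose locomotion controllers that can be deployed on a variety of quadrupedal robots with similar morphologies. We present a simple but effective morphology randomization method that procedurally generates a set of simulated robots for training. We show that by training on this set of simulated robots, our models acquire more general control strategies that transfer directly to novel simulated and real-world robots with diverse morphologies that were not observed during training.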
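
The morphology randomization idea can be illustrated with a minimal sketch. The abstract does not say which parameters are randomized or over what ranges, so the `Morphology` fields, the nominal values, and the uniform scaling range below are all assumptions chosen for illustration, not the authors' procedure.

```python
import random
from dataclasses import dataclass


@dataclass
class Morphology:
    # Hypothetical parameters; the abstract does not list the real ones.
    body_length: float  # metres
    leg_length: float   # metres
    body_mass: float    # kilograms


def randomize_morphology(nominal, rng, scale_range=(0.8, 1.2)):
    """Scale each nominal parameter by an independent uniform factor."""
    low, high = scale_range
    return Morphology(
        body_length=nominal.body_length * rng.uniform(low, high),
        leg_length=nominal.leg_length * rng.uniform(low, high),
        body_mass=nominal.body_mass * rng.uniform(low, high),
    )


def generate_training_set(nominal, count, seed=0):
    """Procedurally generate `count` simulated robot morphologies."""
    rng = random.Random(seed)  # seeded so the set is reproducible
    return [randomize_morphology(nominal, rng) for _ in range(count)]


# Assumed nominal robot, used only to anchor the random scaling.
nominal = Morphology(body_length=0.4, leg_length=0.25, body_mass=12.0)
robots = generate_training_set(nominal, count=5)
```

A single policy would then be trained across all entries of `robots` rather than on one fixed robot, which is the property the abstract credits for transfer to unseen morphologies.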

Product Re-identification System in Fully Automated Defect Detection

Chenggui Sun , Li Bin Song

Categories: Computer Vision

2021-12-20

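In this work, we introduce a method and present an improved neural network for product re-identification, an essential core function of a fully automated product defect detection system. Our method is based on feature distance. It combines feature-extraction neural networks, such as VGG16 and AlexNet, with the image search engine Vearch. The dataset we used to develop the product re-identification system is a water bottle dataset comprising 400 bottled-liquid samples. This is a small dataset, which was the biggest challenge of our work. Nevertheless, combining neural networks with Vearch shows potential for tackling the product re-identification problem. In particular, our new neural network, AlphaAlexNet, an improved network based on AlexNet, raises product identification accuracy by four percentage points. This indicates that satisfactory product identification accuracy can be achieved when efficient feature extraction methods are introduced and redesigned for image feature extraction of nearly identical products. To address the biggest challenges, namely the small size of the dataset and the inherent difficulty of identifying products that differ very little from one another, our future work proposes a new roadmap for identifying nearly identical products: introducing or developing new algorithms that need very few images to train.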
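
Re-identification by feature distance can be sketched as a nearest-neighbour lookup. The sketch below assumes the feature vectors have already been extracted by a network such as VGG16 or AlexNet; the three-dimensional vectors, the product identifiers, and the brute-force search are illustrative stand-ins and do not use the Vearch API.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reidentify(query, gallery):
    """Return (product_id, distance) of the gallery entry closest to the query."""
    best_id, best_dist = None, math.inf
    for product_id, features in gallery.items():
        dist = euclidean_distance(query, features)
        if dist < best_dist:
            best_id, best_dist = product_id, dist
    return best_id, best_dist


# Hypothetical features; real ones would come from a CNN and be much longer.
gallery = {
    "bottle_a": [0.9, 0.1, 0.3],
    "bottle_b": [0.2, 0.8, 0.5],
    "bottle_c": [0.4, 0.4, 0.9],
}
query = [0.25, 0.75, 0.5]
match, distance = reidentify(query, gallery)
```

For nearly identical products the gallery distances lie close together, so the quality of the feature extractor decides whether the nearest neighbour is the right product.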

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Categories: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, the user features consist of the 20 user property features with the greatest information gain together with user tweet features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
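
Selecting features by information gain can be sketched as follows. This is a minimal illustration that assumes discrete feature values and a tiny hand-made label set; the feature names and data are hypothetical, and the abstract does not describe how MGTAB discretizes or scores its property features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Names of the k features with the largest information gain."""
    gains = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Hypothetical toy data: four accounts, three binary property features.
labels = ["bot", "bot", "human", "human"]
features = {
    "has_default_avatar": [1, 1, 0, 0],  # separates the classes perfectly
    "verified": [0, 1, 0, 1],            # carries no information here
    "many_followers": [0, 0, 0, 1],      # partially informative
}
selected = top_k_features(features, labels, k=2)
```

With `k=20` and real account properties, the same ranking step would produce a feature subset of the size the abstract mentions.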

HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval

Jie Guo , Meiting Wang , Yan Zhou , Bin Song , Yuhao Chi , Wei Fan , Jianglong Chang

Categories: Computer Vision

2022-12-16

Image-text retrieval (ITR) is a challenging task in multimodal information processing because of the semantic gap between modalities. In recent years, researchers have made great progress in exploring accurate alignment between image and text. However, existing works mainly focus on fine-grained alignment between image regions and sentence fragments, which ignores the guiding role of global context information. Integrating local fine-grained information with global context information can provide more semantic clues for retrieval. In this paper, we propose a novel Hierarchical Graph Alignment Network (HGAN) for image-text retrieval. First, to capture comprehensive multimodal features, we construct feature graphs for the image and text modalities respectively. Then, a multi-granularity shared space is established with a purpose-built Multi-granularity Feature Aggregation and Rearrangement (MFAR) module, which strengthens the semantic correspondence between local and global information and yields more accurate feature representations for the image and text modalities. Finally, the resulting image and text features are further refined through three-level similarity functions to achieve hierarchical alignment. To validate the proposed model, we perform extensive experiments on the MS-COCO and Flickr30K datasets. Experimental results show that the proposed HGAN outperforms state-of-the-art methods on both datasets, which demonstrates the effectiveness of our model.

Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

Bin Wang , Yan Song , Fanming Wang , Yang Zhao , Xiangbo Shu , Yan Rui

Categories: Computer Vision

2022-12-13

To balance annotation labor against the granularity of supervision, single-frame annotation has been introduced in temporal action localization. It provides a rough temporal location for an action but implicitly overstates the supervision from the annotated frame during training, leading to confusion between actions and backgrounds, i.e., action incompleteness and background false positives. To tackle these two challenges, we present the Snippet Classification model and the Dilation-Erosion module. In the Dilation-Erosion module, we expand the potential action segments with a loose criterion to alleviate the problem of action incompleteness and then remove the background from the potential action segments to alleviate the problem of background false positives. Relying on the single-frame annotation and the output of the snippet classification, the Dilation-Erosion module mines pseudo snippet-level ground truth, hard backgrounds, and evident backgrounds, which in turn further train the Snippet Classification model, forming a cyclic dependency. Furthermore, we propose a new embedding loss to aggregate the features of action instances with the same label and separate the features of actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method. Code has been made publicly available (https://github.com/LingJun123/single-frame-TAL).
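
The expand-then-trim idea behind the Dilation-Erosion module can be illustrated on a one-dimensional sequence of snippet scores. The thresholds, the scores, and the boundary-only trimming rule below are assumptions made for this sketch; they are not taken from the paper or its released code.

```python
def dilate(scores, anchor, loose_threshold):
    """Grow a segment around the annotated snippet while neighbouring
    scores stay at or above a loose threshold."""
    start = end = anchor
    while start > 0 and scores[start - 1] >= loose_threshold:
        start -= 1
    while end < len(scores) - 1 and scores[end + 1] >= loose_threshold:
        end += 1
    return start, end


def erode(scores, segment, anchor, strict_threshold):
    """Trim boundary snippets whose score falls below a strict threshold,
    never shrinking past the annotated snippet itself."""
    start, end = segment
    while start < anchor and scores[start] < strict_threshold:
        start += 1
    while end > anchor and scores[end] < strict_threshold:
        end -= 1
    return start, end


# Hypothetical per-snippet action scores; index 3 is the annotated frame.
scores = [0.1, 0.35, 0.6, 0.9, 0.8, 0.4, 0.32, 0.05]
anchor = 3
candidate = dilate(scores, anchor, loose_threshold=0.3)  # snippets 1..6
refined = erode(scores, candidate, anchor, strict_threshold=0.5)  # snippets 2..4
```

In this toy version the dilation step guards against incomplete actions by being generous, and the erosion step guards against background false positives by discarding the weakly supported edges.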

Task-Oriented Image Transmission for Scene Classification in Unmanned Aerial Systems

Xu Kang , Bin Song , Jie Guo , Zhijin Qin , F. Richard Yu

Categories: Computer Vision

2021-12-21

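The vigorous development of the Internet of Things makes it possible to extend its computing and storage capabilities to computing tasks in aerial systems through cloud-edge collaboration, especially for artificial intelligence (AI) tasks based on deep learning (DL). Unmanned aerial vehicles (UAVs) collect large amounts of image and video data but, because of their limited storage and computing capabilities, can only hand intelligent analysis tasks over to a back-end mobile edge computing (MEC) server. How to efficiently transmit the information most relevant to the AI model is a challenging topic. Inspired by recent work on task-oriented communication, we propose a new aerial image transmission paradigm for the scene classification task. A lightweight model is developed on the front-end UAV for semantic block transmission, with awareness of both image content and channel conditions. To achieve a trade-off between transmission latency and classification accuracy, deep reinforcement learning (DRL) is used to explore the semantic blocks that contribute most to the back-end classifier under various channel conditions. Experimental results show that the proposed method significantly improves classification accuracy compared with a fixed transmission strategy and traditional content-aware methods.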

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Ziwei Luo , Jing Hu , Xin Wang , Shu Hu , Bin Kong , Youbing Yin , Qi Song , Xi Wu , Siwei Lyu

Categories: Artificial Intelligence | Computer Vision

2021-12-14

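Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods iteratively optimize an objective function through a specific deformation model and require meticulous parameter tuning, yet they have limited capability on images with large deformations. Deep learning-based methods can learn the complex mapping from input images to their respective deformation fields, but they are regression-based and prone to getting stuck in local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning framework that performs step-wise registration. The key idea is to warp the moving image successively at each time step so that it finally aligns with the fixed image. Because handling high-dimensional continuous action and state spaces is challenging in the conventional reinforcement learning (RL) framework, we introduce a new concept, the "plan", into the standard actor-critic model; the plan is low-dimensional and helps the actor generate a tractable high-dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results show that our method achieves consistent, significant gains and outperforms state-of-the-art methods.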

Stochastic Actor-Executor-Critic for Image-to-Image Translation

Ziwei Luo , Jing Hu , Xin Wang , Siwei Lyu , Bin Kong , Youbing Yin , Qi Song , Xi Wu

Categories: Computer Vision

2021-12-14

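Training a model-free deep reinforcement learning model to solve image-to-image translation is difficult because it involves high-dimensional continuous state and action spaces. In this paper, we draw inspiration from the recent success of the maximum entropy reinforcement learning framework, which was designed for challenging continuous control problems, to develop stochastic policies over high-dimensional continuous spaces that cover image representation, generation, and control. Central to this method is the Stochastic Actor-Executor-Critic (SAEC), an off-policy actor-critic model with an additional executor that generates realistic images. Specifically, the actor focuses on the high-level representation and control policy through a stochastic latent action, and it explicitly directs the executor to generate low-level actions that manipulate the state. Experiments on several image-to-image translation tasks demonstrate the effectiveness and robustness of the proposed SAEC on high-dimensional continuous space problems.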

The Three Stages of Learning Dynamics in High-Dimensional Kernel Methods

Nikhil Ghosh , Song Mei , Bin Yu

Categories: Machine Learning (Statistics) | Machine Learning

2021-11-13

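To understand how deep learning works, it is crucial to understand the training dynamics of neural networks. Several interesting hypotheses about these dynamics have been made based on empirically observed phenomena, but there is limited theoretical understanding of when and why such phenomena occur. In this paper, we consider the training dynamics of gradient flow on kernel least-squares objectives, which is a limiting dynamics of SGD-trained neural networks. Using precise high-dimensional asymptotics, we characterize the dynamics of the fitted model in two "worlds": in the Oracle World the model is trained on the population distribution, and in the Empirical World the model is trained on a sampled dataset. We show that, under mild conditions on the kernel and the $L^2$ target regression function, the training dynamics undergo three stages characterized by the behavior of the models in the two worlds. Our theoretical results also mathematically formalize some interesting deep learning phenomena. Specifically, in our setting we show that SGD progressively learns more complex functions and that there is a "deep bootstrap" phenomenon: during the second stage, the test errors of the two worlds remain close even though the empirical training error is much smaller. Finally, we give a concrete example comparing the dynamics of two different kernels, which shows that faster training is not necessary for better generalization.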
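
The Empirical World dynamics can be illustrated with a small numerical sketch. It discretizes gradient flow on a kernel least-squares objective with Euler steps and tracks only the training error on a sampled dataset. The RBF kernel, the bandwidth, the five sample points, the sine target, and the step size are all assumptions for illustration; the sketch does not reproduce the paper's high-dimensional asymptotic analysis or the Oracle World.

```python
import math


def rbf_kernel(x, z, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars; an assumed choice."""
    return math.exp(-((x - z) ** 2) / (2 * bandwidth ** 2))


def training_errors(xs, ys, steps, step_size=1.0):
    """Euler discretization of gradient flow on kernel least squares.

    On the training points the residual r = f - y evolves as
    r <- r - (step_size / n) * K r, where K is the Gram matrix.
    Returns the mean squared training error before each step.
    """
    n = len(xs)
    gram = [[rbf_kernel(a, b) for b in xs] for a in xs]
    residual = [-y for y in ys]  # the model starts at f = 0
    errors = []
    for _ in range(steps):
        errors.append(sum(r * r for r in residual) / n)
        update = [sum(gram[i][j] * residual[j] for j in range(n)) for i in range(n)]
        residual = [r - step_size / n * u for r, u in zip(residual, update)]
    return errors


# Hypothetical sampled dataset with a sine target.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [math.sin(x) for x in xs]
errors = training_errors(xs, ys, steps=50)
```

Because the RBF Gram matrix has eigenvalues between 0 and n, a step size of 1 keeps the training error non-increasing. Residual components along large-eigenvalue directions decay first, which gives a simple picture of simpler components of the target being fit before more complex ones.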
